Application of Support Vector Machines to Melissopalynological Data for Honey Classification
Authors
Abstract
In this paper, the authors address the problem of discriminating the geographical origin of honeys and selecting marker species, using Support Vector Machines and z-scores. The methodology is based on processing palynological data with statistical learning methods. This approach provides a simple yet powerful tool for detecting the origin of honey samples. For honeys from the Sorrento Peninsula, discrimination from other Italian honeys is achieved with high accuracy.
DOI: 10.4018/jaeis.2010070105. International Journal of Agricultural and Environmental Information Systems, 1(2), 85-94, July-December 2010. Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
[…]tion, such as Protected Designation of Origin (PDO), can give extra value to European honey. These laws protect a regional food, vegetable or fruit by defining specific and objective procedures to determine the quality standard of the product and to verify the product's link to its geographical origin. At present, the procedure for verifying the geographical origin of honey is not well established; some attempts have been made to overcome this situation, and a promising field of research relies on applying statistical methods to data on the pollen content of honeys (Aronne, 2010). Floral honey always contains numerous pollen grains, mainly from the plant species foraged by honey bees. Pollen analysis of honey, namely melissopalynology, is very useful for determining and verifying both botanical and geographical origin. The determination of geographical origin is based on the entire pollen spectrum being consistent with the flora of a particular region and with any reference spectra or descriptions in the literature (Louveaux, 1978; Herrero, 2001; Persano Oddo, 2004; Persano Oddo, 2007; Aronne, 2010).
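The abstract names z-scores as the tool for selecting marker species but does not spell out the exact criterion here. The following is a minimal sketch under stated assumptions: pollen percentages are standardized per species across all samples (z = (x - mean) / standard deviation), and a species is flagged as a candidate marker for a group (for example, Sorrento Peninsula honeys) when its mean z-score within that group exceeds a threshold. The function names, the threshold, the species names and the percentages are hypothetical; this is not the authors' data or procedure.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values as z = (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A species with identical percentages in every sample carries no signal.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def candidate_markers(pollen, labels, target, threshold=0.5):
    """Return {species: mean z-score in target group} for species above threshold.

    pollen: dict mapping species name -> list of pollen percentages, one per sample.
    labels: list of group labels aligned with each species' list of samples.
    Hypothetical selection rule, for illustration only.
    """
    markers = {}
    for species, values in pollen.items():
        z = z_scores(values)
        group_z = [zi for zi, lab in zip(z, labels) if lab == target]
        if not group_z:
            raise ValueError(f"no samples labelled {target!r}")
        score = mean(group_z)
        if score > threshold:
            markers[species] = score
    return markers


# Hypothetical non-chestnut pollen percentages for four samples.
pollen = {
    "Species A": [4.0, 5.0, 0.5, 0.0],
    "Species B": [1.0, 1.0, 1.0, 1.0],
    "Species C": [0.0, 0.5, 3.0, 4.0],
}
labels = ["Sorrento", "Sorrento", "other", "other"]
print(candidate_markers(pollen, labels, "Sorrento"))
```

With these hypothetical numbers only "Species A" is returned for the Sorrento group, with a mean z-score just under 1; "Species B" has zero variance, and "Species C" is more abundant in the other group.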
Analysis of the geographical origin of honey is based on the assumption that the bees' choice of melliferous plant species is influenced by the peculiarities of the local vegetation. Therefore, the palynological component of a honey, if correctly analysed, should provide information on the foraging area. This research topic is particularly important when the aim is to safeguard consumer interests and to protect honest producers of honeys labelled with an indication of geographical origin (Aronne, 2008). Although some computer-aided methods for the classification of honeys have been developed (Battesti, 1992; Scala, 2004a; Scala, 2004b; Aronne, 2008a; Aronne, 2008b; Aronne, 2008c; Aronne, 2010), at present the identification of geographical origin depends mainly on the experience and knowledge of the palynologists, who are asked to compare the results from specific samples with a hypothetical pollen spectrum of a honey that could be produced in the same geographical area. The processing of melissopalynological data therefore requires precise, sensitive analytical tools that go beyond subjective evaluation and make it possible to correlate data and information that would otherwise remain elusive. Starting from the assumption that palynological data contain valuable information on the vegetation of the area in which the bees foraged, we used analysis tools and techniques that have been successfully applied in other fields, such as genetics, agriculture, economics and computer science (Guarracino, 2008; Mucherino, 2009). We believe that data mining techniques and concepts from statistical learning theory can provide a methodology for analysing the pollen content of honeys. Such an analysis depends on the choice of data models, their analysis and implementation, and the use of specific algorithms.
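Since the abstract names Support Vector Machines as the classifier, the following minimal sketch shows that technique on its own: a linear soft-margin SVM trained by Pegasos-style stochastic subgradient descent on the regularized hinge loss, in pure Python. It is not the authors' implementation. The features (two hypothetical pollen percentages per sample), the labels (+1 for Sorrento Peninsula, -1 for elsewhere), the regularization constant `lam` and the number of epochs are all assumptions chosen for illustration. Samples are visited in a fixed order rather than at random, and the bias is learned as an extra weight on a constant feature, which also penalizes it; both are simplifications.

```python
def _augment(x):
    # Append a constant feature so the bias is learned as the last weight.
    # Simplification: the bias is then also L2-penalized.
    return list(x) + [1.0]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=500):
    """Minimize lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).

    Pegasos-style subgradient steps with step size 1 / (lam * t).
    X: list of feature vectors; y: labels in {+1, -1}.
    """
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * _dot(w, xi)
            # Gradient step on the L2 term: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge-loss subgradient is nonzero only for margin violations.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if _dot(w, _augment(x)) >= 0 else -1


# Hypothetical training data: two pollen percentages per honey sample.
X = [[4.0, 0.0], [5.0, 0.5], [0.5, 3.0], [0.0, 4.0]]
y = [1, 1, -1, -1]  # +1 = Sorrento Peninsula, -1 = elsewhere (illustrative)
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

On these separable toy points the trained weights classify all four samples correctly. In practice one would normally rely on an established solver (for example LIBSVM) together with kernel choice and cross-validation; the sketch only makes the underlying optimization explicit.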
In this paper we report the application of these tools to melissopalynological data; our specific purpose was to test and define a new methodological approach to tracing the geographical origin of honeys. To this end, we used the results of palynological analysis of chestnut honeys (honey produced from the nectar of Castanea sativa Mill.) from the Sorrento Peninsula (Southern Italy) and from other areas of Italy. According to Von der Ohe et al. (2004), a honey can be labelled “chestnut honey” if at least 86% of its pollen grains are from Castanea sativa. As a consequence, in this kind of honey only the remaining pollen fraction (at most 14%) can account for differences between samples, which makes it quite difficult to assess their geographical origin. The aim of our work was to find a mathematical model able to distinguish chestnut honeys produced in the Sorrento Peninsula from those produced elsewhere. In the following sections, starting from an initial statistical evaluation of the data, we report a synthesis of the working phases. In the next section, the data used for the experiments are described. In section 3, data preparation is detailed. In section 4, the methods used for classification are presented. In sections 5 and 6, the results of classification and variable selection are discussed. Finally, in section 7, conclusions are drawn.
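The 86% threshold quoted above can be made concrete with a short check. This sketch assumes that a pollen analysis is summarized as raw grain counts per taxon (the taxon names other than Castanea sativa, the function names and the counts are hypothetical). It computes the Castanea sativa share, applies the threshold attributed to Von der Ohe et al. (2004), and returns the residual spectrum, expressed as percentages of all counted grains, that is left to carry the geographical signal.

```python
CHESTNUT = "Castanea sativa"
CHESTNUT_MIN_SHARE = 86.0  # minimum % of Castanea sativa pollen, as quoted in the text


def percentages(counts):
    """Convert raw pollen grain counts per taxon into percentages of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one pollen grain must be counted")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def is_chestnut_honey(counts):
    """True if Castanea sativa accounts for at least 86% of the pollen grains."""
    return percentages(counts).get(CHESTNUT, 0.0) >= CHESTNUT_MIN_SHARE


def residual_spectrum(counts):
    """Percentages (of all grains) of every taxon other than Castanea sativa."""
    return {t: p for t, p in percentages(counts).items() if t != CHESTNUT}


# Hypothetical counts for two samples.
sample_1 = {CHESTNUT: 900, "Taxon X": 60, "Taxon Y": 40}
sample_2 = {CHESTNUT: 800, "Taxon X": 200}
print(is_chestnut_honey(sample_1), residual_spectrum(sample_1))
print(is_chestnut_honey(sample_2))
```

Here sample_1 passes the check (90% Castanea sativa) and its residual taxa sum to 10%, while sample_2 (80%) does not. For any sample that passes, the residual can never exceed 14%, which is why discrimination has to rely on a small fraction of the pollen spectrum.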
Similar resources
A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES
Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems has been studied in order to improve the accuracy of classification. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only considers both accuracy and generalization criteria in a single objective fu...
A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater
The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...
Application of Data Mining Algorithms to the Discrimination of Sediment Sources in the Nodeh Watershed, Gonabad
Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription involves two...
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...
Journal title:
- IJAEIS
Volume 1, Issue
Pages -
Publication date 2010